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Extending approximate Bayesian computation 
methods to high dimensions via a Gaussian copula 

model 

J. Li* * D. J. Nott*, Y. FarL and S. A. Sisson^ 


Abstract 

Approximate Bayesian computation (ABC) refers to a family of inference methods used 
in the Bayesian analysis of complex models where evaluation of the likelihood is dif¬ 
ficult . Conventional ABC methods often suffer from the curse of dimensionality, and 
a marginal adjustment strategy was recently introduced in the literature to improve 
the performance of ABC algorithms in high-dimensional problems. The marginal ad¬ 
justment approach is extended using a Gaussian copula approximation. The method 
first estimates the bivariate posterior for each pair of parameters separately using a 
2-dimensional Gaussian copula, and then combines these estimates together to esti¬ 
mate the joint posterior. The approximation works well in large sample settings when 
the posterior is approximately normal, but also works well in many cases which are 
far from that situation due to the nonparametric estimation of the marginal posterior 
distributions. If each bivariate posterior distribution can be well estimated with a low¬ 
dimensional ABC analysis then this Gaussian copula method can extend ABC methods 
to problems of high dimension. The method also results in an analytic expression for 
the approximate posterior which is useful for many purposes such as approximation of 
the likelihood itself. This method is illustrated with several examples. 

Keywords: Approximate Bayesian Computation (ABC), Gaussian copula, Likelihood 
free inference, Marginal adjustment, Regression adjustment ABC. 


1 Introduction 


Part of the class of “likelihood-free” techniques, approximate Bayesian computation (ABC) 
methods are commonly implemented to draw samples from an approximation to the poste¬ 
rior distribution when the likelihood function is computationally intractable. This scenario 


arises in an increasingly broad range of discipline areas (Beaumont et al. 2002 Bortot et al. 


2007, Drovandi and Pettitt 2011b). 


Denote the prior for a parameter vector 9 = (0 1 , • • • ,9 P ) T C Q p as p(9), the computa¬ 
tionally intractable likelihood function as L(y\9), and the resulting posterior distribution 
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as ir(8\y 0 bs) oc L(y o b s \6)p(0 ), for observed data y 0 b s ■ The same basic mechanism underlies 
most ABC algorithms. For each of i = 1,..., N candidate draws from the prior distribution, 
$W ~ p(0), an auxiliary dataset y® ~ L(y\9^) is sampled from the data generation process 
given 9^\ Suppose that s = S(y) is a vector of summary statistics with dim(s) < dim(y), 
and that s 0 b s = S(y 0 b s ). If ||sW — s ofes || is small, for some distance measure || • ||, then 
could credibly have generated the observed summary data s Q b s , so 0^ l > is a possible draw 
from Tr(0\y o b s ). Conversely, if ||s^ — s 0 b s \\ is large, then 9^ is unlikely to have generated 
the observed data, so 0 W is not likely to be a draw from the posterior. Specifically, the 
resulting samples (0(d jS (O) are draws from the joint distribution 

TTh BC {0, s\s obs ) oc K h (\\s - s obs \\)L(s\8)p(9), (1) 

where Kh is a standard smoothing kernel with scale parameter h > 0. A simple importance 
sampling ABC algorithm describing this simulation process is given in Table [lj Note that 
direct evaluation of the intractable likelihood function is circumvented. 


Input: 

An observed dataset, y 0 bs- 
A desired number of samples N > 0. 

An importance sampling distribution f(9), with f{9) > 0 if p{9) > 0. 

A smoothing kernel Kh and scale parameter h > 0. 

A low-dimensional vector of summary statistics s = S(y). 

Compute s obs = S(y obs )- 

Iterate: 

For i = 1,..., N: 

1. Sample a parameter vector from importance distribution ~ m- 

2. Simulate a dataset from the likelihood y ® ~ L(y\9^) given parameter vector 9^\ 

3. Compute the summary statistics s W = S'(yW). 

4. Weight each sample by oc Kh(\\s^ — s 0 bs\\)p(9^) / f (8^). 

Output: 

A set of i = 1,..., N samples (8^\ s^) with weights w®, drawn from n^ BC (8 : s|s 0 & s ). 


Table 1: A simple ABC importance sampling algorithm. 


Integrating out the auxiliary summary dataset from 0 results in the ABC approxima¬ 
tion to the posterior 


TTh BC (9 1 s 0 bs) oc J K h (\\s - s obs \\)L(s\9)p(9)ds. 


( 2 ) 
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This distribution has the property that if S(y) is sufficient for 9 , and if h —> 0 then 
lim/^o BC (@\ s obs) = tt(^| Hobs): so tluat the exact posterior distribution is recovered. How¬ 
ever, in practice sufficient statistics are typically unavailable for intractable models, and in 
simulations ^ s a b s (so that h > 0) in all but trivial settings. As a result, tt^ bc (9\s 0 b s ) 
will only approximate the posterior in general. For further details on ABC models and 
alternative sampling algorithms see e.g. Beaumont et al. (2009), |Sisson et al. (2007)[ 
Marjoram et al. (2003), |Drovandi and Pettitt (2011a) 


One of the primary restrictions in the application of ABC methods in general is that 
they suffer from the curse of dimensionality (Blum 2010). Casual inspection of ([2]) indicates 
that ABC methods are based on a kernel density estimate of the likelihood function. Kernel 
density estimation is well known to be reliable only in low dimensions. Here the relevant 
dimension is in the comparison of s with s 0 b s (not to be confused with the univariate 
quantity ||s — s 0 bs||)- As dim(s) > dim(0) is required for reasons of parameter identifiability, 
this means that ABC methods perform poorly in models with even a moderate number 


of parameters. In practice, it is not uncommon that dim(s) >> dim(0) (Allingham et al. 
Bortot et al. 2007), so ■k^ bc {0\s o b s ) can be a poor approximation of n(6\y 0 b s ) even 


2009 


for low dimensional models. 

In some circumstances, the curse of dimensionality problem can be circumvented. This 
can occur where the intractable likelihood function factorises in some way (Bazin et al. 


2010, White et al. 2015 Barth e l m e and Chopin 2014). For example, suppose that L(y\0) = 


Y\j L(y(j)\Q) where represents some subset of y, and that conditional simulation from 
L{y(j)W) is possible; in this case, the comparison of s and s Q b s can be directly reduced to 
multiple lower dimensional (even univariate) comparisons. However, these approaches are 
problem specific, and are not suitable for usage with general, non-factorisable models. 
More generally applicable methods have been proposed, such as the regression and 


marginal adjustments (Beaumont et al. 2002; Nott et al. 2014). The regression adjustment 
takes advantage of the lack of an exact match between s W and s D b s by constructing a 
regression model to capture the relationship between the parameter vector and the summary 
statistics. 


Beaumont et al. (2002) introduced the weighted linear regression model 


9 ^ — OL + /3 T (s^ — S 0 bs ) + 

where a is a p x 1 vector, j3 is a q X p matrix of regression coefficients (where q = dim(s)) 
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and £$ are zero-mean iid errors, and where the weight for the pair is given by 

KhiWs^ — s 0 6 S ||). Writing the least squares estimates of a and (3 as a and /3, and the 
resulting empirical residuals as £,, the linear regression adjusted vector 

6®* = 0®-P T (8®-8o b3 ) = a + § i 


is approximately a draw from 7r(0|s o ft s ) = lim^o BC (0\s o bs) if the assumptions of the 
regression model hold. More flexible non-linear, heteroscedastic regression adjustments have 


been developed (Blum and Francois 2010; Blum et al. 2013). The regression adjustment 
can work well in improving the ABC posterior approximation, however it only mitigates, 


rather than removes the underlying curse of dimensionality problem (Nott et al. 2014). 


The marginal adjustment (Nott et al. 2014) first constructs estimates of ir(d\s 0 b s ) using 
regular ABC with regression adjustment, and precise estimates of the univariate marginal 
posterior distributions 7r(#i|s 0 f>s,(i)) for i = 1,... ,p, where s 0 ft Sj (j) is a subset of s 0 b s infor¬ 
mative for 0i. The marginal posterior of 6i can often be estimated well, due to the reduced 
dimensionality of the marginal summary statistic s 0 b s o). The marginal distributions of the 
initial estimate of ir(d\s 0 b s ) are then adjusted to be those of the more precisely estimated 
marginals, through an appropriate replacement of order statistics. The final adjusted pos¬ 
terior can be a substantial improvement over standard ABC and regression adjustment 


methods (Nott et al. 2014). While the marginal adjustment in itself avoids the curse of 
dimensionality problem, and can be applied to analyses with non-factorisable likelihood 
functions, the dependence structure within the initial estimate of 7r(0|s o ft s ) is not adjusted, 
and so the final marginally adjusted sample can have a very poor dependence structure. 

In this article we propose a new method for constructing an ABC approximation to 
the posterior distribution that can be easily implemented in high dimensions, well beyond 
current ABC practice, while maintaining a viable dependence structure. Our approach 
is based on constructing a Gaussian copula to approximate the dependence structure of 
7r(0|s o ft s ), and on using the ideas behind the marginal adjustment to maintain full flexi¬ 
bility in representing the univariate margins. The p-dimensional dependence structure of 
the Gaussian copula can be efficiently determined from the Gaussian copula dependence 
structures estimated from all bivariate parameter pairs ( 9i,0j ). As such, an advantage of 
this approach is that it plays to existing ABC method strengths: namely in only estimating 
low-dimensional (bivariate and univariate) posterior distributions. The copula approach 
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accordingly overcomes the curse of dimensionality inherent in standard ABC methods, per¬ 
mitting the estimation of posterior distributions with viable dependence structures, for 
arbitrarily large p-dimensional parameter vectors. 

This article is structured as follows: In Section [2] we introduce the Gaussian copula, 
and describe our proposed ABC method in detail. A simulated example and two real data 
analyses are presented in Section [3j The first real data analysis, based on the multivariate 
g-and-A; distribution, estimates a p = 184 dimensional posterior distribution, which is, 
in principle, comfortably beyond the capabilities of any previous ABC analysis. Higher 
dimensional analyses could have been considered. The second real data analysis focuses on 
robust Bayesian variable selection, and illustrates how copula ABC can outperform both 
standard ABC and regular exact Bayesian inference even in moderate-dimensional analysis 
(here p = 17) in a discrete posterior setting. Section [4] concludes with a discussion. 


2 Gaussian copula ABC 


According to the classical Bernstein-von Mises theorem (Van der Vaart 2000), under stan¬ 


dard regularity conditions, the posterior distribution n(9\y 0 b s ) is asymptotically normal. 
This motivates the use of structured density estimation models for ABC which contain the 
multivariate normal. In particular, we consider the meta-Gaussian family of distributions 


(Fang et al. 2002), which model dependence through a Gaussian copula, as we describe 


further below. Meta-Gaussian densities have the property that the p-dimensional joint den¬ 
sity can be reconstructed from all bivariate marginal densities. In the present setting, if 
bivariate marginal posterior densities can be well estimated using low-dimensional ABC 
analyses, then meta-Gaussian approximations to these densities can be combined into a 
meta-Gaussian approximation of the full posterior distribution. As it is constructed from 
well estimated marginal densities, the resulting posterior approximation would avoid the 
ABC curse of dimensionality problem, and can be expected to perform favourably compared 
to existing ABC approaches in high dimensional models. 

Suppose that the random vector 6 = (0 1 ,..., 9 P ) T has a continuous multivariate density 
g(-), with univariate marginal densities (/,(•) and marginal distribution functions G'j(-) for 6i , 
i = 1,... ,p. The copula C of 9 is defined as the joint distribution of U = (U\,.. ., U P ) T = 
( G\(9i ),..., G p (9 p )) T , and it contains full information on the dependence structure among 
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the components of 6. Sklar’s theorem (Sklar 1959) states that the multivariate density can 


be written as g{6) = C(G\{9\),... ,Gp(8 p )), which permits a decoupling of the modelling 
of the copula and the univariate marginal densities in order to model the joint density (e.g. 
Joe 1997]). 

Define 77 = (771,... , r] p ) T with 77,; = <h~ 1 (G',:($*)), for i = 1 ,... ,77, where <f> is the standard 
normal cumulative distribution function. If 77 is multivariate normal, 77 ~ IV(0, A), then the 
copula of 6 is called a Gaussian copula, and 9 has a meta-Gaussian distribution with density 
function given by 

9 ( 0 ) = j^^exp j^T7 T (I- A" 1 )77|lJ^(6» i ), (3) 


where I denotes the identity matrix. 

The multivariate normal family is embedded within the family of meta-Gaussian distri¬ 
butions. Writing (/>(■) as the standard normal density function, then the univariate normal 
distribution has density function f(x 1 ) = where u 1 = ■ For a 77- 

dimensional normal distribution with mean 7 / = (fii ,..., g p ) T and covariance 

matrix E, and writing oj = (wi, • • • ,w p ) T with Wj = Xi ~^ i for i = 1 ,... , 77 , then the joint 
normal density of x = (rci,..., x p ) T can be expressed as 


/(*) 


(^jET eXP l“^ (X “ '‘ )TE_1(I “ M) } 



(4) 


where R is the corresponding correlation matrix of E. Observe that R in Q corresponds 
to A in meaning that the correlation matrix of the Gaussian distribution is exactly the 
correlation matrix of the corresponding Gaussian copula. 

In the ABC setting, if approximate normality of 7 r( 0 |s o 6 s ) holds, possibly after marginal 
transformations of the parameters, then we may utilise a Gaussian copula model to esti¬ 
mate the dependence structure of vr(0|s o fo s ) in light of Q. As previously noted, all bivariate 
marginal densities completely determine the joint density in a meta-Gaussian distribu¬ 
tion, and these bivariate densities can usually be easily and precisely estimated in low¬ 
dimensional ABC analyses. As such, it will be possible to obtain a reliable estimate of the 
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joint posterior 7r(0|s o ;, s ), even in high-dimensional problems, something which is in principle 
comfortably beyond the capabilities of current ABC methods. In essence, we propose to 
estimate each bivariate density using low-dimensional ABC methods, approximate these 
with a 2-dimensional Gaussian copula, and then combine them to obtain an approximate 
joint posterior using ([3]). 

More precisely, the procedure we propose is as follows: 

1. For each pair (i,j) with i = 1, • • • ,p — 1 and j = i + 1,... ,p: 


(a) Identify the summary statistics as a subset of s which are informative for 

(0i, 6j). 

(b) Use conventional ABC methods to draw an approximate sample 9^,..., 9^ 
from 7r(#|s(jj)). Extract the ( i,j) th components from 8^\ ..., 6^ to form an ap¬ 
proximate sample (8- 1 \9j 1 ^), ..., (9\ n \ 9^) from the bivariate marginal n(9i, 9j\s^^). 

(c) Let r^\ ..., r-"' ) be the ranks of 9 ^\..., 9[ n \ and qf\ ..., qf^ 1 be the ranks of 


3 (i) 


, . (£) (7) 

,9, . Set r/> = and for £ = l,...,n. 


j 


(d) Calculate the sample correlation of ( 77 ^, 77 ^),..., ( 77 -”^, 77 ^) and denote it Aj. 


f= A 




2. For i = 1,... ,p: 


(a) Identify the summary statistics so) as a subset of s which are informative for 0,,. 

(b) Use conventional ABC methods to draw an approximate sample 9 ^ l \..., 9^ n '^ 

from 7r(0|s(j)). Extract the i th component from ..., 9^ n " > to form an approx¬ 
imate sample 6 ^,..., ^ from the univariate marginal 7r(0j|s(j)). 

(c) Use density estimation methods to approximate the marginal density g-i(9i ) (de¬ 
noted gi(9i)) based on 9^\ ..., #*”'). 

3. Combine all Ajj’s to form the p-dimensional correlation matrix A with diagonal ele¬ 
ments 1. The final Gaussian copula estimate of ir(9\s 0 b s ) is obtained via Q with A 
estimated by A and gi(9i) estimated by g%(9i) for i = 1 ,,p. 

The above algorithm is easy to implement, and is computationally efficient as the calcu¬ 
lations in Steps [l] and [2] can be performed in parallel for each i, j. While there is no restric¬ 
tion on the types of ABC methods used to draw approximate samples from 7r(#j, 9j\suj^) 
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and 7r(0j|s(j)), one possible efficient implementation could be to estimate all bivariate and 
univariate marginal densities using importance sampling (Table [l]) using the same large 
initial sample {9^\s^) ~ L(s\9)p(9) for l = This approach is common in the 

ABC literature (e.g. Nunes and Balding 2010| |Blum et al. 2013 Prangle et al. 2014), 
and is one we adopt in the analyses of Section |3j Alternatively, separate samplers could be 
implemented (in parallel) for each univariate and bivariate margin, although at potentially 
higher computational overheads. 

A key element in the accurate estimation of the bivariate and univariate marginal den¬ 
sities is the identification of suitable subsets of statistics so u and syy While this may 
initially seem difficult, it is not uncommon to be able to identify specific summary statis¬ 
tics as informative for specific parameters, particularly in some structured models (e.g. 


Drovandi and Pettitt 2011b Nott et al. 2014). However, in more general cases, established 


techniques exist for the semi-automatic construction of a single summary statistic for each 


model parameter (Fearnhead and Prangle 2012). This method is particularly useful in the 
present framework. 

As the meta-Gaussian distribution © is used as an approximation to vr(0|s o b s ), it is 
sensible to examine the quality of the final approximation. This can be achieved through 


existing diagnostic procedures for ABC approximations (Prangle et al. 2014), or during the 
construction of the copula model itself. For the latter, note that bivariate Gaussian copula 
models for each g(9i, 9j ) are available through ([3]), and can be estimated as gij(9i, 9j ) given 
gi(9i), gj{9j) and A ij. Similarly, a bivariate kernel density estimate of vr(0,;, 9j |s 0 &s), denoted 


(i) fl (iH 


gij(9i,9j), can be constructed from the samples ( 9\ ,9 


(0- n ),#j n )) in Step 


lb 


If approximate normality of the posterior holds, then the bivariate dependence struc¬ 
ture can be well described by a Gaussian copula, and hence gij{9i,9j) will provide a close 
approximation to gij(9i, 9j ). If for every bivariate pair (0*, 9j ) the Gaussian copula estimate 
(jij(9i , 9j ) provides a close approximation to gy (9 t , 9j), this suggests that the full poste¬ 
rior may be adequately modelled by a Gaussian copula. Of course, capturing all bivariate 
dependence structures well does not necessarily mean that the full joint dependence will 
be captured well. As such, some kind of application specific predictive validation of the 
approximate joint posterior may be needed. 

Finally, we note that the estimate A obtained by combining the A,;j is not guaranteed to 











be positive definite (although in all our later analyses it was). If this occurs then alternative 


procedures for constructing A can be adopted, such as the methods considered in Lpland 


et al. (2013)|, We also note that the use of a plug-in estimator for A ignores the possibility 


of large estimation errors. If this is a realistic possibility in any analysis, then a sensitivity 
analysis should be performed. 


3 Examples 

3.1 A toy example 


We first examine how the ABC Gaussian copula posterior ©> performs in a simple toy 
example, where the posterior distribution is known. The model that we consider is y ~ 
N p (9, X) for p > 2, where y = (yi ,..., y p ) T , 9 = (9 1 ,...,6 P ) T and £ = diag((j 0 ,..., a 0 ). 


For the prior we specify the ‘twisted-normal’ prior of Haario et al. (1999) with density 
function proportional to 


p(9) oc exp 



(9 2 - b9\ + 1006) 2 
2 



For p = 2, the third term in the exponent is set to be zero. This prior is essentially a product 
of independent Gaussian distributions with the exception that the component for (9\. 62 ) 
is modified to produce a ‘banana’ shape, with the strength of the bivariate dependence 
determined by the parameter b. Simulation from p{9) is achieved by first drawing 9 ~ 
fVp(0, A) where A = diag(100,1,..., 1) and then transforming 02 —> 02 + b9\ — 1006. 

For the following we specify <Jo = 1, and 6 = 0.1 to produce strong prior dependence 
between 9\ and 62 . We determine y 0 b s = (10, 0,..., 0) T as a single observed vector, and 
construct the vector of summary statistics as s = S(y) = y, the full, p-dimensional dataset. 
We exploit knowledge of the model and set = Si as the subset of summary statistics 
that are informative for with the exception of S( 2 ) = ( s i> s “i) for 02 ■ The unions of these 
informative subsets and are taken when constructing the subsets S(i.j) informative 
for the bivariate parameter pair ( 9i,9j ). 

The following analyses are based on N = 1,000,000 samples ( 9^\s ^) ~ L(s\Q)ir( 6 ), 
i = 1,..., N. In sampling from each 1- and 2-dimensional ABC posterior approximation, 
as required to construct the Gaussian copula approximation g(0) of g(9), we specify the 
smoothing kernel Kh(') as uniform over the range (—h,h) and determine h as the 0.01 
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quantile of the N observed differences between simulated and observed summary statistics 
(with different summary statistics for each marginal-posterior approximation), producing 
n = n! = 10, 000 equally weighted samples for analysis. In each case, both local linear 


regression-adjustment (Beaumont et al. 2002) and marginal adjustment (Nott et al. 2014) 
were implemented to improve the posterior approximation. Euclidean distance ||s — s 0 & s || = 
Ef=i( s * ~ s 0 bs,i j 2 ] 1 / 2 was used to compare simulated and observed summary statistics. 


(a) Rejection 


(b) Regression 


(c) Kde: g 



Figure 1: Contour plots of the (0i, # 2 ) margin of various ABC posterior approximations (black lines) 
to the p = 5 dimensional model, 7r(0|s o b s ). True contours are shown in grey-dashed lines, and contour 
levels indicate 0.1, ..., 0.9 of maximum density estimate. Standard ABC approximations consist of 
(a) rejection sampling, (b) rejection sampling with regression adjustment, (d) rejection sampling with 
marginal adjustment, and (e) rejection sampling, with regression and marginal adjustment. Panel 
(c) illustrates regression and marginal adjusted estimate < 71 , 2 ( 01 ,02) of 7r(0i, 02|s(i,2))> whereas panel 
(g) shows the copula ABC approximation < 71 , 2 ( 01 , 02 )- 


Figure [l] illustrates contour plots of various estimates of the bivariate posterior margin 
7r(0i, 02\s o b s ) (solid lines), each derived from estimates of the full distribution vr(0|s o b s ) when 
p = 5. Contour plots of the true bivariate margin are given by the grey dashed lines. The 
left column of Figure [l] shows the estimates obtained via standard rejection ABC using 
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the full vector of summary statistics s 0 b s , both without (panel (a)) and with (panel (d)) 
marginal adjustment. The univariate margins for the marginal adjustment were obtained 
from the p = 2 dimensional model. From panel (a), rejection sampling alone performs fairly 
poorly - the correlation between 9 1 and 62 is captured reasonably well, but the univariate 
margins are too dispersed. Following a marginal adjustment (panel (d)), the margins are 
corrected to the right scale, but now it becomes evident that the dependence structure is 
not perfectly estimated. 

The centre column of Figure [lj shows the same information as the rejection-based es¬ 
timates, except that a linear regression adjustment has been performed in each case after 
the rejection stage, and before the marginal adjustment. Clearly the regression adjusted 
samples (panel (b)) approximate the true posterior very well, to the extent that no further 
visual improvements are apparent following a subsequent marginal adjustment (panel (e)). 

Panel (c) displays the kernel density estimates 62 ), obtained following regression 

and marginal adjustments, but where each margin is only conditioned on the subvector 
of summary statistics sn 2 ) rather than on the full vector s 0 b s ■ That the kernel density 
estimates are largely the same as for the standard ABC analyses indicates that the subvector 
sn 2 ) is highly informative for the bivariate parameter pair, and that these are therefore 
appropriate to use when fitting the copula model. 

Panel (g) shows the fitted bivariate copula estimates 51,2 ($i 5 #2) based on ([ 3 ]). As the 
copula ABC approximation is highly similar to the kernel density estimate <71,2(#1, #2), this 
indicates that the copula model is both appropriate and accurate for these bivariate margins. 
Similar qualitative comparisons can be made for all other bivariate marginal distributions 
(results not shown), implying that the full copula model g{9) may be extended as a good 
approximation of vr(0|s o ;, s ). 

Figure [ 2 ] shows the same estimates of the bivariate margin n(9i, # 2 |s 0 &s) as Figure [T] 
except that they are derived from estimates of the full distribution vr(0|s o fe s ) when p = 50. 
In this scenario, the limitations of standard ABC methods become apparent. Due to the in¬ 
creased number of parameters, p, the rejection sampling estimate of the margin n( 6 i, 6 ^ 21 Sobs) 
is highly similar to the ‘banana’ prior distribution, vr(0). This deviation cannot be corrected 
by adjusting the margins (panel (d)). The regression adjusted estimate (panel (b)) performs 
better - it is centered on the right location, although the margins are too diffuse, and the 
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Figure 2: Contour plots of the (0 lt 0 2 ) margin of various ABC posterior approximations (black lines) 
to the p = 50 dimensional model, 7r(0|s o ;, s ). True contours are shown in grey-dashed lines and contour 
levels indicate 0.1, ..., 0.9 of maximum density estimate. Standard ABC approximations consist of 
(a) rejection sampling, (b) rejection sampling with regression adjustment, (d) rejection sampling with 
marginal adjustment, and (e) rejection sampling, with regression and marginal adjustment. Panel 
(c) illustrates regression and marginal adjusted estimate < 71 , 2 ( 01 ,02) of 7r(0i, 02|s(i,2))> whereas panel 
(g) shows the copula ABC approximation < 71 , 2 ( 01 , 02 )- 


posterior correlation has disappeared. Correcting the margins (panel (e)) improves this as¬ 
pect, although it cannot recover the lost dependence structure. In comparison, the copula 
marginal estimate 51,2 (0i, $ 2 ) retains the same accuracy as for the p = 5 dimensional model 
as it is constructed in exactly the same way. 

To illustrate more precisely the performance of each ABC posterior estimation method as 
dimension p increases, Table [ 2 ] shows the mean estimated Kullback-Leibler (KL) divergence 
between 7r(0i, ^ 2 1«o6s) and the bivariate margin of each ABC approximation, based on 100 
replicates. The number in parentheses is the standard error of this estimate. As dimension 
increases, the performance of rejection ABC deteriorates drastically, as expected. As p gets 
very large, the KL divergence will level off to that obtained by comparing 7r(0i, 02|s o fes) to the 
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p 

Rejection 

Rejection 
(Marginal adj.) 

Regression 

Regression 
(Marginal adj.) 

Copula ABC 

2 

0.058 (<0.001) 

0.040 (<0.001) 

0.043 (<0.001) 

0.035 (<0.001) 

0.039 

(<0.001) 

5 

0.807 (0.001) 

0.053 (0.001) 

0.613 (0.002) 

0.037 (<0.001) 

0.040 

(<0.001) 

10 

1.418 (0.002) 

0.100 (0.001) 

1.078 (0.002) 

0.061 (0.001) 

0.040 

(<0.001) 

15 

1.912 (0.002) 

0.292 (0.002) 

1.229 (0.003) 

0.202 (0.001) 

0.039 

(<0.001) 

20 

2.288 (0.002) 

0.450 (0.001) 

1.280 (0.003) 

0.292 (0.001) 

0.039 

(<0.001) 

50 

3.036 (0.003) 

0.520 (0.002) 

1.474 (0.009) 

0.335 (0.001) 

0.040 

(<0.001) 

100 

3.362 (0.002) 

0.524 (0.002) 

1.619 (0.013) 

0.341 (0.001) 

0.039 

(<0.001) 

250 

3.663 (0.003) 

0.515 (0.002) 

1.737 (0.015) 

0.344 (0.001) 

0.039 

(<0.001) 


Table 2: Estimated Kullback-Leibler divergence of the (0i,02) margin of various ABC posterior 
approximations to n(8i, 02|sobs), as a function of model dimension p. Numbers represent mean 
divergences over 100 replicates with standard errors given in parentheses. 


bivariate ‘banana’ prior p{ 6 1 , 0o), as the ABC estimate of the posterior becomes equivalent 
to that prior as p —>• 00 . The marginally adjusted rejection sample performs better, though 
only because it at worst maps the ‘banana’ prior to the region of high posterior density - 
it otherwise performs poorly (see e.g. Figure [2](d)) . 

The regression-adjusted estimates perform better than the rejection ABC estimates, as 
they exploit the linear relationship between 0* and s* in order to better identify the high 
posterior density region. However, even regression adjustment is known to only mitigate the 


curse of dimensionality in ABC (Nott et al. 2014). As p gets large, the best performance 
will be obtained by performing regression adjustment on the prior distribution (which is 
the limiting approximation for rejection sampling). Performing the marginal adjustment 
can improve on this, but as all dependence structure has been lost with higher dimensions, 
the best possible approximation here is a product of the independent marginal estimates 


(Nott et al. 2014). 

In contrast, the copula ABC approach is constructed from low dimensional (i.e. bivari¬ 
ate) estimates of vr(0j, 0j\s o b s ), regardless of the dimension of the full model. As such, it can 
near perfectly capture the dependence structure of all bivariate pairs of the full posterior 
distribution, which is near Gaussian in this example. That is, its performance is completely 
independent of model dimension. 


3.2 A high-dimensional, multivariate g-and-k model 


A multivariate version of the g-and-k distribution was introduced by Drovandi and Pet- 


titt (2011b). This g-dimensional distribution is constructed by linking q univariate g- 
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and-A: marginal distributions (Rayner and MacGillivray 2002), with marginal parameters 
(Ai, Bi, gi,ki) for i = 1 together with a Gaussian copula with correlation matrix 

V for the dependence structure. The univariate g-and-k distribution has no closed form 
density, but is defined through its quantile function as 


Q{q\A,B,g,k) = A + B 


1 + c 


1 - exp {-gz(q)} 


(1 + z(q) 2 ) k z(q) 


(5) 


T + exp {-gz(q)} 

for B > 0, k > —1/2, where the parameters A,B,g and k control location, scale, skewness 
and kurtosis respectively, and where z(q) denotes the g-th quantile of the standard normal 
distribution function. The parameter c measures the overall asymmetry, and is fixed at 0.8 


as a conventional choice (Rayner and MacGillivray 2002). Several ABC approaches to in¬ 
ference for the univariate g-and-k and related distributions have previously been considered 


(Allingham et al. 2009; Drovandi and Pettitt 2011b, Fearnhead and Prangle 2012; Peters 


and Sisson 2006). The univariate g-and-k distribution is very flexible, with many common 


distributions obtained or well approximated by appropriate parameter settings, such as the 
normal distribution when g = k = 0. Given (A, B, g,k), simulations z(p) ~ 1V(0,1) drawn 
from a standard normal distribution can be transformed into samples from the g-and-k 
distribution through Q. To obtain draws from the multivariate model, first draw samples 
from Ng( 0, V), and then adjust each of the q margins as for the univariate case. 

Note that the use of a Gaussian copula for the multivariate g-and-k distribution is 
completely distinct from our use of a Gaussian copula to approximate 7r(0|s o & s ) through 
Q. However, the copula construction of the multivariate g-and-k distribution does permit 
the ABC analysis of a single model type with an arbitrarily large number of parameters. The 
number of unknown parameters in this model consists of the four parameters (Aj, Bj, gi, ki) 
for each of the q univariate margins, plus q(q — l)/2 correlation parameters = Vji for 
i,j = 1 ,...,q, in the correlation matrix V = \y\ij of the g-and-A; copula. This gives 
4q + q(q — l)/2 parameters in total for the (/-dimensional model. 

The observed data consist of q = 16 foreign currency exchange log daily returns against 
the Australian dollar (AUD) for 1,757 trading days from 1st January 2007 to 31st December 


2013 (Reserve Bank of Australia 2014). Hence, our most complex model has 184 unknown 
parameters. This is considerably beyond the scope of any previous ABC analysis that does 
not rely on likelihood factorisation to perform the analysis. 


For the univariate model margins, Drovandi and Pettitt (2011b) proposed the following 
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robust summary statistics as informative for the four model parameters: 

Sa = L 2 , S k = (E 7 -E 5 + E 3 -E l )/S B , 

S B = L 3 -Lt, and S g = (L 3 + L x - 2L 2 )/S B , 

where Lj and Ej respectively denote the z-th sample quartile and j-th octile of the dataset 
y. We adopt these statistics as directly informative for each respective model parameter in 
defining s^, so that e.g. Sa is informative for A and S g is informative for g. The exception 
to this is that we specify ( S B ,Sk ) as informative for B. The dependence of B on both of 
these statistics is immediately apparent by regressing 9 on s = (Sa, S b , S g , Sk ) using the N 


samples (9^\ s^), in the mould of|Fearnhead and Prangle (2012)[ Also following 


Drovandi 


and Pettitt (2011b), we use the robust normal scores correlation coefficient (Fisher and 


Yates 1948) as the informative summary statistic for each correlation parameter between 


the z-th and j-th data margins. The unions of these informative subsets s m and su\ are 
taken when constructing the subsets informative for bivariate parameter pairs. So e.g. 
( S g ,Sk ) and (Sa, S B , S k ) are taken as informative for (g,k) and (A,B) respectively. The 
prior p(9) is defined as uniform over the support of the parameter space for the (Aj, Bi,gi,k{) 
margins and a Wishart(I 9 , q ) distribution with q degrees of freedom for V. where I q denotes 
the q x q identity matrix. 

The following analyses are based on N = 500,000 samples (9^ l \s^) ~ L(s\9)f(9), 
l = 1,..., N, where the importance sampling distribution f(9) is defined by U(— 0.1, 0.1) x 
U( 0,0.05) x U(— 1,1) x U(— 0.2,0.5) for each g-and-fc marginal parameter set (A,, Bi, gi,ki), 
and the Wishart(/ g , q) prior distribution for the correlation matrix V. The uniform range 
for each marginal parameter was determined via a pilot analysis using a moderate number of 


samples (9^\ s^), following Fearnhead and Prangle (2012) The smoothing kernel (■) is 
uniform over (—h, h ) where h is determined as the 0.01 quantile of the N differences between 
simulated and observed summary statistics ||s^ — s 0 b s ||. Mahalanobis distance was used to 
compare simulated and observed summary statistics ||s — s 0 bs|| = [(s — Sobs/So 1 ( s — Sobs)] 1 ^ 2 , 
where So = Cov(s|$o) was estimated as the sample covariance of 2000 samples from L(s\9q), 
and where 9q is determined as the vector of means of the marginal density estimates gi(9i ) 
z = 1,... ,p. 

Figure[3]illustrates contour plots of various ABC approximations of the bivariate (B±, k \) 
posterior marginal distribution. The top row corresponds to the q = 3-dinrensional model 
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with p = 15 parameters. The middle and bottom rows correspond to q = 10- and 16- 
dimensional models with p = 85 and 184 parameters respectively. Column 1 shows the 
ABC posterior approximation based on importance sampling and regression adjustment 
only. Clearly the approximation is poor, regardless of the dimension of the model, with 
the approximation becoming slightly variable as model dimension increases. The density 
estimates in column 2 are based on rejection sampling and marginal adjustment only. Here, 
while the marginal adjustment brings the posterior approximation to the right scale, the 
previously observed negative dependence structure between B\ and k\ has been lost due to 
the initially poor rejection sampling estimate (not shown). 

Column 3 of Figure [3] illustrates conventional best performance ABC: rejection sam¬ 
pling followed by both regression and marginal adjustments. In this scenario the estimated 
marginal posterior seems credible for any model dimension, displaying viable scale and neg¬ 
ative dependence structure. Column 4 shows the same posterior approximation as column 
3, except that only the subset of summary statistics = (Ss 1 ,Sk 1 ) is used in the es¬ 
timation, rather than the full p-dimensional vector, s. As the density estimate is broadly 
equivalent to that using the full vector of summary statistics, this indicates that the sub¬ 
set S(i.j) is indeed highly informative for this parameter pair. Moreover, the estimate of 
I■ s (i,j)) is more precisely estimated than the estimate of tt{B\, Aq|s), indicating that 
there is some effect on the standard ABC approximation as the dimension of the vector 
summary statistics gets large. This loss of precision is not seen when only using the sum¬ 
mary statistic subset suj\. Finally, column 5 displays the bivariate copula margin estimate 
of fci|s), which is effectively the same as the kernel density estimate in column 4. This 
indicates that the copula model provides a good approximation for this bivariate marginal 
distribution. 

What is notable in this analysis is that standard ABC methods are performing ad¬ 
mirably well, even in p = 184 dimensions. Chiefly this is due to the relationships between 
the sampled summary statistics and parameter pairs \ being highly linear, in com¬ 
bination with the structured construction of the multivariate g-and-/c model. The former 
point enables the regression adjustment to estimate the linear dependence structure between 
parameter pairs well, whereas the latter point means that the parameters (A,;, £>,, g t , k t ) of 
the z-th margin, are mostly (but not completely) determined by the data in the same mar- 
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gin. In combination with the marginal adjustment, these allow standard ABC methods 
to produce very good estimates of the posterior distribution. However, some improvement 
is still clearly being brought by the copula approach. These results imply that while the 
original paper that developed this multivariate quantile model for ABC only analysed data 


with q = 2 dimensions (p = 9 parameters) (Drovandi and Pettitt 2011b), this model is 
clearly viable for inference in much higher dimensions. 

We use this example to illustrate another advantage of the copula approach. The fit¬ 
ted copula model g(9) provides an analytic approximation to the posterior distribution, 
7r(0|s 0 fe s ). From Bayes’ rule we have L(s 0 b s \9) oc Tr(9\s 0 b s )/p(9), and hence g(9\y n b s )/p(9) 
is an approximation of a function proportional to the likelihood. We can use this approx¬ 
imation to compute approximations of maximum likelihood estimates and the observed 
information matrix and hence perform frequentist analyses that can be used for comparison 
with the full Bayes analysis. It is also possible to compute marginal likelihoods for subsets 
of parameters after integrating out the other parameters according to the conditional prior. 


Grazian and Liseo (2015) recently considered the use of ABC for this purpose based on 


kernel esitmation of the ABC marginal posterior. If the parameter of interest is of moder¬ 
ate dimension it may be difficult to implement kernel estimation however. For our copula 
method the idea is illustrated in column 5 of Figure [3j where the open circles denote the 
approximate marginal MLE for (B±, k\) - obtained by maximising gi t 2 (Bi, k\)/p{B i, k\) for 
each respective model - and the dashed crosshairs denote +/- two standard errors. 

Such analyses can often be useful for assessing whether there is conflict between the 
marginal prior and marginal likelihood. We note that in applications where approximation 
of the likelihood itself is the goal, the prior can be chosen to be whatever is convenient 
(in the case of approximation of the marginal likelihood, it is the marginal prior for the 
parameter of interest that can be so chosen). If there is prior-likelihood conflict then the 
resulting estimated likelihood may be poor, since the quality of the approximation will be 
very dependent on how well the tails of the posterior are estimated. It is an interesting 
question how best to choose the prior when the goal is likelihood approximation and we do 
not pursue this further here. 
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3.3 Robust Bayesian variable selection 

We consider the problem of Bayesian variable selection in regression, where the parameter 
of interest is a vector of binary variables indicating which covariates are to be included in 
the model for the mean response. This is a challenging problem because the parameter of 
interest is discrete: all existing ABC regression adjustment techniques are concerned with 


continuous parameters (Beaumont et al. 2002, Blum and Francois 2010 Blum et al. 2013). 


Further, the marginal adjustment strategy (Nott et al. 2014) is difficult to apply as it 
needs to be implemented for each covariate model under consideration as the marginal dis¬ 
tribution for each parameter will change conditionally on the covariates. This will rapidly 
become impractical as the number of covariates increases. As a result these methods, which 
were responsible for mitigating the ABC curse of dimensionality and obtaining performance 
competitive with the ABC copula method in the multivariate g-and-/c model analysis (Sec¬ 


tion 3.2), are not available in this setting. While there is a growing literature on ABC 


model choice (e.g. Marin et al. 2015) where multinomial regression has been used to adjust 
model probabilities, such analyses have been confined to the situation where the number 
of different models is relatively small. These methods do not extend in an obvious way to 
problems like the one we consider here where the number of models considered is large. 


We consider the US crime dataset of Ehrlich (1973) in which the response is crime rate, 
measured as the number of offenses per 100,000 population, for 47 different US states in 
1960, and there are 15 covariates, some of which are highly collinear. Choice of which 
covariates to include gives a model selection problem with 2 15 distinct models. Suppose 
that y = (yi,... ,y n ) T is the vector of responses and X is the n x 16 design matrix (with 
the first column containing ones and the remaining columns containing the centred and 
standardized covariates). Write 7 = (71, ... ,7i5) T as a vector of binary indicators, where 
7i = 1 means that covariate i is included in the model and 7* = 0 otherwise (the intercept 
is always included), and define X 7 to be the corresponding design matrix containing only 
those covariates included in the model as indicated by 7. 

We consider the linear model 

y = X 7( 0 7 + e, 

where /3 7 is the vector of regression coefficients in model 7 (similarly considered as a sub¬ 
vector of the full model coefficients /3 = (/3q, /3±,..., /3is) t ) and e ~ N n ( 0, a 2 I n ) is a vector 
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of independent zero mean normal residuals with variance cr 2 . We follow a common prior 


specification for this framework (e.g. Kohn et al. 2001) and set a beta-binomial prior on 
the number of active covariates in the model i.e. P(yj = l|p 7 ) = p 7 independently for each 


i, and p 7 ~ Beta(a, b) with a = 2, b = 10. We adopt the g-prior of Zellner (1986) so that 
/ 3 7 | 7 ,<t 2 ~ 1V(0, ncr 2 (X^ r X 7 ) _1 ) and assume that a 2 ~ InverseGamma^o-, b a ), with a a = 5, 
= 5 X 200 2 which is a fairly diffuse prior centred on a reasonable prior guess for the 
residual standard deviation. With these priors (/3 7 , cr 2 ) can be integrated out of the model 


(e.g. Kohn et al. 2001) to give the marginal posterior vr( 7 |y) oc L(y\'y)p('y) with 


/ n \ ~ ( a<T “*“f) 

L(y | 7 ) <x (n + l )-^/ 2 \ 2b a + y T y - ——y^X^X^X^y) 

where g 7 denotes the number of columns of X 7 . For the US crime dataset, the number of 
predictors is small enough to permit enumeration of all posterior probability of all models 
(2 15 = 32, 768). The ten highest posterior probability models for this data set are listed in 
Table [3] (column 1). 


Exact probability 
(no outlier) 

Standard ABC 
(no outlier) 

Copula ABC 
(no outlier) 

Exact 

(with outlier) 

Copula ABC 
(with outlier) 

X3,X4,X 13 

- 

/ 

- 

/ 

Xl,X 3 ,X4,Xi3 

- 

/ 

- 

/ 

X3,X4,X 1 3,Xm 

- 

/ 

- 

/ 

Xi,X 3 ,X4,Xi3,Xu 

- 

/ 

- 

/ 

X 4 , XT, Xl3 

- 

- 

/ 

- 

Xl,X3,X4,Xn,Xi3,Xi4 

- 

/ 

- 

/ 

X4,X\3 

— 

- 

- 

- 

Xi,X 3 ,X4,Xi4,Xi3 

- 

/ 

- 

/ 

X4,X7,Xi3,Xi4 

- 

- 

- 

- 

^3 5 ^5, 

- 

- 

- 

- 


Table 3: Ten highest posterior probability models for the US crime dataset (column 1). 
Checkmarks (/) indicate those models also selected in the top 10 based on standard ABC 
and a copula ABC posterior approximation. Analyses are repeated with the dataset modi¬ 
fied to include an influential outlier. 

So far ABC methods have played no role in this analysis since the marginal likelihood 
for 7 is directly computable. However, we may use ABC to compute an approximation to 
the posterior distribution, 7 r( 7 |s) oc L(s| 7 )p( 7 ), which is conditional on a summary statistic 
s , constructed so that it’s distribution is insensitive to violations of the model assumptions 
in the full data model 7 r( 7 |y). In particular, we select the summary statistics s to produce 
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robust point estimates of /3 7 , which leads to a Bayesian variable selection framework which is 
insensitive to outliers. In general the sampling distribution of the robust summary statistic is 
intractable, even though the likelihood for the full data y is tractable, and so ABC methods 
are needed. For more detailed discussion of the benefits of using insufficient statistics in 


order to robustify Bayesian analyses see e.g. Lewis et al. (2014) 


In order to estimate L(7|s) via the copula approach, we first estimate 7r(7j|s) and 
7 r (7j,7j|s) using ABC for each parameter i and parameter pair (i,j). Parameter specific 
summary statistics are constructed as s m = Tu, the robust partial f-statistic for significance 
of covariate i in the full model (computed using the robust regression method implemented 


in the lmrob function in the R package robustbase (Rousseeuw et al. 2015) with the 


Roller and Stahel (2011) method). In addition, we fit a reduced model including the co¬ 


variates xi, X3, X4, xn, X13 and 274 - a “good” reduced model for the observed data which 
contains only one covariate from any pair of covariates that are highly correlated. Then for 
i G G = {1,3,4,11,13,14} the robust partial t-statistic for the corresponding variable in 
this reduced model, T^, is added to the summary statistic vector that is informative for 7 j. 
That is, S(j) = (Tii,T2i) for i e G and s^ = Tu otherwise. As before, we construct suj\, 
the vector of statistics informative for (77 7 j), as the union of the marginally informative 
vectors and srjy 

The final estimates of 7r(7*|s) and 7r(7i, 7^|s) are determined via ABC, by generating 
N = 100,000 samples (7 ^\s^) oc T(s|7)p(7) from the prior predictive distribution, and 
retaining the n = n! = 500 samples closest to the observed summary statistics using Eu¬ 
clidean distance and a uniform smoothing kernel Kh(-). The frequency of 7* = 1 within 
these 500 samples provides an estimate of Pi'fi = l|s) and hence an estimate 7r(7jl s (i)) of 
7r(7j|s). Similar estimates 7r(ji, can be obtained for the bivariate posterior distri¬ 

bution 7r(ji, 7j|s). 

In the discrete setting, a Gaussian copula model for 7 is defined via a latent Gaussian 
variable Z = [Z 1 ,... , Z^) T ~ 1V(0, A) where A is a correlation matrix. By setting j- = 
I(Zi > 4> _1 (pj)), where /(•) is the indicator function and pi = 7r(7j = 0|s(j)), for i = 
1,..., 15, then the marginal distribution of j- is that of vr(7i)- The correlation matrix 
A can similarly be chosen so that the joint distribution of (7^,7 '•) is that of 


20 







In particular, A^ is chosen so that 

roo roc 

/ / 4>(zi,Zj-,A i:j )dzidzj = 7r(7i = 1 , 7 , = l|s ( ju), 

where the solution for A ij of this nonlinear equation can be obtained numerically. Once the 
copula parameters have been estimated, joint posterior model probabilities 7r(7|s) for any 
desired value of 7 can be estimated via the copula approximation. 

The middle column of Table [3] indicates which of the ten highest (exact) posterior model 
probability models conditional on the full data y, are also among the ten highest posterior 
model probabilities under the ABC copula approximation. Six out of the exact top ten 
models are correctly identified as being in the top ten using the copula ABC approach. In 
contrast, when performing standard ABC using the full 21-dimensional vector of summary 
statistics (i.e. constructed as the union of the statistics in sm,..., S(i 5 )), none of the exact 
top ten models are identified. In fact, the top ten models under standard ABC consist of the 
null model, and 9 models with a single predictor. As these top posterior models effectively 
coincide with the top models a priori, as the beta-binomial prior essentially favours models 
with fewer predictors, this indicates that the standard ABC posterior approximation is very 
poor, particularly in comparison with the copula ABC approximation. 

As the original motivation for using ABC was to obtain a method for robust regression, 
we now modify one of the observations so that it is an extreme outlier. In particular, for the 
original dataset we modify the last response value by increasing its residual standard error 
estimate (based on the lmrob fit for the full model) by a factor of 10. The last two columns 
in Table [3] indicate which of the original exact ten highest posterior probability models are 
still among the ten highest posterior probability models when using the modified dataset. 
For the exact model probabilities, conditioning on y, the non-robustness of the regression 
model to outliers is apparent as only one of the original models are still among the ten 
highest exact posterior probability models. However, for the robust ABC estimates of the 
model probabilities, both with and without outliers, the same 6 models remain in common 
with the ten best models in the exact analysis without outliers. Clearly, the copula ABC 
method conditioning on robust summary statistics seems useful for finding a set of good high 
posterior probability models in datasets which might be contaminated by a small number 
of outliers. Equally clearly, standard ABC methods are not useful for this purpose. 


21 


4 Discussion 


The standard construction of ABC methods, based on conditional kernel density estima¬ 
tion, means that they do not extend well to high dimensional analyses due to a curse of 
dimensionality on the vector of summary statistics, s. The copula approach introduced 
in this paper constructs a Gaussian copula approximation to the full ABC posterior dis¬ 
tribution. In this manner, the need to simultaneously match a high-dimensional vector of 
simulated and observed summary statistics is circumvented in favour of separately match¬ 
ing many low-dimensional vectors to form the copula approximation. The fitted copula is 
not always appropriate to approximate certain highly complex posterior distributions, as 
it assumes a Gaussian dependence structure (i.e. based on bivariate linear correlations), 
albeit with flexible univariate marginal distributions. This means that non-linear depen¬ 
dencies, or complex higher-order relationships between three or more parameters in the full 
posterior vr(0|s o fe s ) will not be accurately captured. However, copula ABC may be adequate 
in many modelling situations, especially those where an accurately fitted Gaussian copula 
approximation to a highly complex posterior may be more practically useful than a very 


poor standard ABC approximation to the joint model (see e.g. Sections 3.1 and 3.3). The 
copula structure will also become a more appropriate approximation to the true posterior 
as the sample size increases, and the true posterior approaches normality. As such, cop¬ 
ula ABC is a useful and viable general technique for directly extending ABC modelling to 
high-dimensional problems. 

One point of practical consideration for copula ABC is the requirement to select 
and s/ij\ i.e. those subsets of s that are informative for and ( 0i,0j ). In principle, this 
could take the same amount of work in identifying the vector s that is informative for 
9, but repeated many times, over each univariate and bivariate posterior margin. The 


semi-automatic work of Fearnhead and Prangle (2012) is useful here, in that it provides a 
principled way of identifying linear combinations of the elements of a vector of summary 
statistics that are informative for a subset of parameters (they are in fact, Bayes linear 


estimates of those parameters; Nott et al. 2012). While it should be noted that these semi¬ 
automatic statistics are only optimal for the posterior mean, rather than any measure on the 
joint distribution, they have been successfully implemented in a large range of applications. 
Beyond this, the analyst can alternatively make use of knowledge of the structure of the 
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model in order to identify informative subsets of s. We used this approach (in combination 
with the semi-automatic approach) with each of our analyses in Section [3j although an 
alternative would have been to use the semi-automatic approach directly for each bivariate 
margin. In general, the principled identification of summary statistics for ABC methods 


remains a challenging practical problem (e.g. see Blum et al. 2013). 

As well as improving estimation of the posterior dependence structure, copula ABC 
may also be very valuable because it provides an approximate analytic expression for the 
posterior density. As previously discussed, this can be used to build a likelihood approxi¬ 
mation, and permit frequentist analyses that can serve as a reference for comparison with 
a Bayesian analysis. Approximation of likelihood functions can also be important in the 
context of setting informative priors in fully Bayesian analyses, for example in the so-called 


power prior approach (Ibrahim and Chen 2000). Here, a tempered version of the likelihood 
for past, indirect data z is used to set the prior for the analysis of the current data, y. Even 
when the likelihood for the data y is tractable, our knowledge of the past data z might be 
limited to summaries for which the corresponding likelihood is not tractable. Our copula 
ABC approach would then provide a way to make the required likelihood approximations 
for the past data in this situation. 
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Figure 3: Contour plots of the (Bi,k\) margin of various ABC posterior approximations to the 
multivariate g-and-k model, 7r(0|s o f>s). Rows correspond to the q = 3 (top), q = 10 and q = 16 (bot¬ 
tom) dimensional model, which have p = 15, p = 85 and p = 184 parameters respectively. Standard 
ABC approximations consist of (column 1) rejection sampling with regression adjustment, (column 
2) rejection sampling with marginal adjustment, and (column 3) rejection sampling, with regression 
and marginal adjustment. Column 4 illustrates the regression and marginal adjusted kernel den¬ 
sity estimate g(Bi,ki) of n(9i, Ojlsuj)), whereas column 5 shows the corresponding regression and 
marginal adjusted copula ABC approximation g(B i,fci). The dot in each panel indicates the value 
of 9q used to estimate £o i n the Mahalanobis distance calculation. The crosshairs in column 5 show 
the marginal MLE plus or minus approximately two posterior standard deviations. 


27 






















